Creating a project-oriented workflow in R
RProjects and the here package
Learning Objectives
Today we will…
- learn about project-oriented workflows
- create an RProject
- use project-relative file paths with the here package
1 Installation requirements
- required installations/recent versions of:
- R: at least version 4.4.0, “Puppy Cup”
- check current version with R.version
- download/update: https://cran.r-project.org/bin/macosx/
- RStudio: at least version 2023.12.1.402, “Ocean Storm”
- update: Help > Check for updates
- new install: https://posit.co/download/rstudio-desktop/
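The R version check can also be done from the Console. A minimal sketch using base R only; the threshold is the minimum version named in the requirements above, and the object name `r_is_recent` is just illustrative:

```r
# Print the full version string of the running R session
print(R.version.string)

# Compare the running version against the minimum required version
r_is_recent <- getRversion() >= "4.4.0"
print(r_is_recent)  # TRUE if R is at least version 4.4.0
```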
2 Project-oriented workflow
- Folder structure:
- keeping everything related to a project in one place
- i.e., contained in a single folder, with subfolders as needed
- Project-relative working directory
- the project folder should act as your working directory
- all file paths should be relative to this folder
2.1 Folder structure
- a core computer literacy skill
- keep your Desktop as empty as possible
- have a sensible folder structure
- avoid mixing subfolders and files
- i.e., if a folder contains subfolders, ideally it should not contain files
3 RProjects
- in data analysis, using an IDE is beneficial
- e.g., RStudio
- most IDEs have their own implementation of a Project
- in RStudio, this is the RProject
- creates a .Rproj file in a project folder
- stores project settings
- you can have several RProjects open simultaneously
- and run several scripts across projects simultaneously
- most importantly, RProjects (can) centralise a specific project’s workflow and file paths
- to read more about R Projects, check out Section 6.2: Projects from Wickham et al. (2023; or Ch. 8 - Workflow: Projects in Wickham & Grolemund, 2016)
3.1 Creating a new Project
- when?
- whenever you’re starting a new course or project which will use R
- why?
- to keep all the relevant materials in one place
- where?
- somewhere that makes sense, e.g., a folder called SoSe2024 or Mastersarbeit
- how?
- File > New Project > New Directory > New Project > [Directory name] > Create Project
New RProject
Create a new RProject for this workshop
- File > New Project > New Directory > New Project > [Directory name] > Create Project
- make sure you choose a sensible location
3.2 Opening a Project
- to open a project, locate its .Rproj file and double-click it
- or, if you’re already in RStudio, you can use the Project (None) drop-down (top right)
3.3 Adding a README file
- File > New File > Markdown File (not R Markdown!)
- add some text describing the purpose of this project
- include your name and the date
- use Markdown formatting (e.g., # for headings, *italics*, **bold**)
- save as README.md in your project directory
3.4 Global RStudio options
- Tools > Global Options
- Workspace: Restore .RData into workspace at startup: NO
- Save workspace to .RData on exit: Never
- this will ensure that you are always starting with a clean slate
- and that your code is not dependent on some package or object you created in another session
- this is also how RMarkdown and Quarto scripts run
- they start with an empty environment and run the script linearly
Global settings
Change your Global Options so that
- Workspace: Restore .RData into workspace at startup: NO
- Save workspace to .RData on exit: Never
3.5 Identifying your RProject
- there are a few ways to check which (if any) RProject you’re in
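One way to check from the Console, as a minimal sketch. It assumes the rstudioapi package is installed when run inside RStudio; outside RStudio only the working directory is printed.

```r
# When an RProject is open, the working directory should be the project folder
print(getwd())

# Inside RStudio, rstudioapi can report the path of the active project
# (it returns NULL if no project is open)
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  print(rstudioapi::getActiveProject())
}
```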
4 Folder structure
- some folders you’ll typically want to have:
- data: containing your dataset(s)
- scripts (or analyses, etc.): containing any analysis scripts
- manuscript: containing any write-ups of your results
- materials: containing relevant experiment materials (e.g., stimuli)
- let’s just create the first 2 (data and scripts)
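The two folders can be created by hand in your file browser or from R. A minimal sketch of the latter: `project_root` is a hypothetical location (a temporary folder, so the example runs anywhere); in practice it would be your RProject folder.

```r
# Hypothetical project root; replace with your own RProject folder
project_root <- file.path(tempdir(), "my_project")

# The two folders we want for now
folders <- c("data", "scripts")

# Create each folder (and the project root, if it does not exist yet)
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Check the result
print(list.files(project_root))
```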
data/
- do you have “raw” data, i.e., data that has not yet been processed?
- if so, you might want to create a raw sub-folder
- and any other relevant sub-folders (e.g., processed or tidy)
- download the dataset from the workshop repo (from Chromý et al., 2023)
- or, move a dataset of your own to this folder
scripts/
- try to create a single script for each “product”
- e.g., anonymised data, “cleaned” data, data exploration, visualisation, analyses, etc.
- you can create sub-folders as the project develops and move scripts around
- for now, let’s create a new script to take a look at our data
New script
Create a new Quarto script:
- File > New File > Quarto Document
- Add a title
- Uncheck the Use Visual Editor box
- Click Create
- Save it in your scripts/ folder: File > Save as...
Load in the data
- load in the data however you normally would
- e.g., readr::read_csv()
5 here-package
- the here package (Müller, 2020) enables file referencing
- avoids the use of setwd()
5.1 The problem with setwd()
If the first line of your R script is setwd("C:\Users\jenny\path\that\only\I\have") I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
-- Jenny Bryan
- setwd() depends on your entire machine’s folder structure
- setwd() breaks when you
- send your project folder to a collaborator
- make your analyses open
- change the location of your project folder
- the path separator (/ or \) also depends on your operating system
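To illustrate the contrast, here is a minimal sketch using base R only. The commented-out absolute path is hypothetical; the file name is the workshop dataset used later in these slides.

```r
# Brittle: an absolute path that exists on one machine only (hypothetical path)
# setwd("/Users/jenny/path/that/only/I/have")

# More portable: a path relative to the project folder,
# built by file.path() so no separators are typed by hand
path_data <- file.path("data", "data_lifetime_pilot.csv")
print(path_data)
```

The relative path still depends on the working directory being the project folder, which is the gap the here package fills.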
5.2 The benefit of here()
- builds file paths relative to the top-level directory of your project
- folder and file names are given as separate arguments, separated by commas, so no slashes are needed
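A minimal sketch of both points, assuming the here package is already installed and that the code is run from within an RProject (otherwise here falls back to other ways of finding a root):

```r
# Assumes the here package is installed: install.packages("here")
library(here)

# The project root that here has detected for this session
print(here())

# Folder and file names are separate arguments;
# here() joins them and prepends the project root
path_data <- here("data", "data_lifetime_pilot.csv")
print(path_data)
```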
here
Load the dataset using here
- Install here (e.g., install.packages("here"))
- Load here at the beginning of your script
- or use here:: before calling a function
- Use the here() function to load in your data
- Inspect the dataset however you usually would (e.g., summary(), names(), etc.)
- Save your script
5.3 here::here()
- install package
In the Console
install.packages("here")
- load package and call the here function
# load package
library(here)
# read in data
df_data <- read.csv(here("data", "data_lifetime_pilot.csv"))
- or directly call the here function without loading the package
# read in data without loading here
df_data <- read.csv(here::here("data", "data_lifetime_pilot.csv"))
- note that I stored the data with the prefix df_
- df stands for dataframe
- I recommend using object-type defining prefixes for all objects in your Environment
- e.g., fit_ for models, fig_ for figures, sum_ for summaries, tbl_ for tables, etc.
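A short sketch of the prefix convention in practice. The built-in mtcars dataset stands in for a project dataset, and all object names are illustrative only:

```r
# mtcars ships with R and stands in for a project dataset here
df_cars <- mtcars

# Summary statistics, stored with the sum_ prefix
sum_cars <- summary(df_cars)

# A simple linear model, stored with the fit_ prefix
fit_cars <- lm(mpg ~ wt, data = df_cars)

# A frequency table, stored with the tbl_ prefix
tbl_cyl <- table(df_cars$cyl)

# The prefixes make it easy to list objects by type
print(ls(pattern = "^(df|sum|fit|tbl)_"))
```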
Reproduce your analysis
- Perform some data exploration (e.g., with names(), summary(), dplyr::glimpse(), whatever you typically do)
- Save your script, then close RStudio/your RProject.
- Re-open the project. Can you re-run the script?
Learning objectives
Today we…
- learned about project-oriented workflows ✅
- created an RProject ✅
- established a self-contained project environment with here ✅
References
Bryan, J., & The STAT 545 TAs. (n.d.). R Basics and workflows. In STAT 545 Course materials. Retrieved May 6, 2024, from https://stat545.com/
Chromý, J., Brand, J., Laurinavichyute, A., & Lacina, R. (2023). Number agreement attraction in Czech and English comprehension: A direct experimental comparison. Glossa Psycholinguistics, 2(1), 1–20. https://doi.org/10.5070/G6011235
Müller, K. (2020). here: A Simpler Way to Find Your Files (Version 1.0.1). https://CRAN.R-project.org/package=here
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). https://r4ds.hadley.nz/
Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.